Poster: Summus Aristoteles Date: Jul 26, 2010 5:26am
Forum: texts Subject: Re: Out of copyright book scanned in British library - ok to upload?

I assume these watermarks are placed directly on the images (as raster). If the watermarks are vector text instead (for instance, a watermark on every page of a pdf that serves as a container for the images), then another technique can be used to remove them:

- take the pdf
- uncompress the streams inside it (this can be done with pdftk, for example):

pdftk input.pdf output uncompressed.pdf uncompress

- now look for the text string of the watermark (if the watermark displays "British Library", search for "British Library")

this can be done (without opening the uncompressed pdf directly) with a stream editor like sed (available on both windows and linux)

sed -e "s/watermark text/ /g" uncompressed.pdf > unwatermarked.pdf
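
The substitution can be tried on a small sample before touching a real file. This is only a sketch: it assumes the watermark is stored as a literal string in a text-showing operator (if the pdf stores it hex-encoded or split into pieces, a plain search will not find it). The file names sample_stream.txt and sample_clean.txt and the sample content are made up for illustration.

```shell
# Hypothetical sample: two lines as they might look in an uncompressed content stream.
printf '%s\n' 'BT /F1 12 Tf (British Library) Tj ET' 'BT /F1 12 Tf (Chapter I) Tj ET' > sample_stream.txt

# Count the lines that contain the watermark string before replacing it.
matches=$(grep -c 'British Library' sample_stream.txt)
echo "lines with watermark: $matches"

# Replace the string with a single space; LC_ALL=C makes sed treat the data as plain bytes.
LC_ALL=C sed -e 's/British Library/ /g' sample_stream.txt > sample_clean.txt

# Count again; grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
remaining=$(grep -c 'British Library' sample_clean.txt || true)
echo "lines with watermark after sed: $remaining"
```

If the first count is 0 on the real uncompressed pdf, the watermark is probably not stored as plain text and this approach will not help.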

now we re-compress the resulting pdf (with pdftk)

pdftk unwatermarked.pdf output compressed.pdf compress

to rebuild the XREF table, another step can optionally be performed

pdftk compressed.pdf output compressedfixed.pdf
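
The uncompress, replace and compress steps could be wrapped in one shell function, roughly like this. It is a sketch under assumptions: the function name strip_watermark is made up, pdftk and sed are assumed to be on the PATH, and the watermark text is used as a sed pattern, so characters like / or . in it would need escaping. The optional XREF pass is left out.

```shell
# Sketch only; strip_watermark is a made-up name, not part of pdftk.
# Usage: strip_watermark input.pdf "watermark text" output.pdf
strip_watermark() {
    in_pdf=$1
    text=$2
    out_pdf=$3
    tmp_dir=$(mktemp -d) || return 1

    # 1. Uncompress the streams so the text becomes searchable.
    pdftk "$in_pdf" output "$tmp_dir/uncompressed.pdf" uncompress || return 1

    # 2. Replace the watermark string with a space (the text is treated as a sed pattern).
    LC_ALL=C sed -e "s/$text/ /g" "$tmp_dir/uncompressed.pdf" > "$tmp_dir/unwatermarked.pdf" || return 1

    # 3. Re-compress the edited file into the requested output.
    pdftk "$tmp_dir/unwatermarked.pdf" output "$out_pdf" compress || return 1

    # Remove the intermediate files.
    rm -rf "$tmp_dir"
}
```

It would be called as strip_watermark input.pdf "British Library" output.pdf, and the result should still be opened and checked by eye.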

if Bob can give us further details, we can help better